Automatic Profiling of MPI Applications with Hardware Performance Counters
Author
Abstract
This paper presents an automatic counter instrumentation and profiling module added to the MPI library on Cray T3E and SGI Origin2000 systems. During execution, the module gathers a detailed summary of the hardware performance counters and the MPI calls of any MPI production program and writes it to a special syslog file in MPI_Finalize. The user can obtain the same information in a separate file. Statistical summaries are computed weekly and monthly. The paper describes experiences with this library on the Cray T3E systems at HLRS Stuttgart and TU Dresden. It focuses on the problems of integrating the hardware performance counters into MPI counter profiling and presents first results with these counters. A second software design is also described that allows the profiling layer to be integrated into a dynamic shared object MPI library without consuming the user's PMPI profiling interface.
Similar resources
Automatic Profiling of MPI Applications with Hardware Performance Counters
This paper presents an automatic counter instrumentation and profiling module added to the MPI library on Cray T3E and SGI Origin2000 systems. A detailed summary of the hardware performance counters and the MPI calls of any MPI production program is gathered during execution and written in MPI_Finalize on a special syslog file. The user can get the same information in a different file. Statistical su...
FLEX-MPI: An MPI Extension for Supporting Dynamic Load Balancing on Heterogeneous Non-dedicated Systems
This paper introduces FLEX-MPI, a novel runtime approach for the dynamic load balancing of MPI-based SPMD applications running on heterogeneous platforms in the presence of dynamic external loads. To effectively balance the workload, FLEX-MPI monitors the actual performance of applications via hardware counters and the MPI profiling interface—with a negligible overhead and minimal code modifica...
A Framework for Comparative Performance Analysis of MPI Applications
Parallel application developers face a myriad of parameters when trying to understand the performance behavior of their code. Even within a single hardware configuration, the performance of any application will depend, among other things, on factors such as the MPI library or some application-level input parameters. This paper deals with the problem of how to determine the cause for performance v...
Automatic Monitoring of Memory Hierarchies in Threaded Applications with AMEBA
In this paper we present an approach to online automatic monitoring of memory hierarchies in threaded applications. Our environment consists of a monitoring system and an automatic performance analysis tool. The EPC monitoring system uses static instrumentation of the source code and information from the hardware counters to generate performance data for selected code regions and data structur...
Performance Analysis and Optimization of a Hybrid Seismic Imaging Application
Applications to process seismic data are computationally expensive and, therefore, employ scalable parallel systems to produce timely results. Here we describe our experiences of using performance analysis tools to gain insight into an MPI+OpenMP code developed by Shell that performs Reverse Time Migration on a cluster to produce models of the subsurface. Tuning MPI+OpenMP programs for modern p...
Publication date: 1999